MSM/RD: Coupling Markov state models of molecular kinetics with reaction-diffusion simulations
Molecular dynamics (MD) simulations can model the interactions between
macromolecules with high spatiotemporal resolution but at a high computational
cost. By combining high-throughput MD with Markov state models (MSMs), it is
now possible to obtain long-timescale behavior of small to intermediate
biomolecules and complexes. To model the interactions of many molecules at
large lengthscales, particle-based reaction-diffusion (RD) simulations are more
suitable but lack molecular detail. Thus, coupling MSMs and RD simulations
(MSM/RD) would be highly desirable, as they could efficiently produce
simulations at large time- and lengthscales, while still conserving the
characteristic features of the interactions observed at atomic detail. While
such a coupling seems straightforward, fundamental questions are still open:
Which definition of MSM states is suitable? Which protocol to merge and split
RD particles in an association/dissociation reaction will conserve the correct
bimolecular kinetics and thermodynamics? In this paper, we make the first step
towards MSM/RD by laying out a general theory of coupling and proposing a first
implementation for association/dissociation of a protein with a small ligand (A
+ B ⇌ C). Applications to a toy model and to CO diffusion into the heme cavity
of myoglobin are reported.
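As a rough illustration of the MSM ingredient mentioned above, the following sketch estimates a transition matrix from a discretized trajectory by counting transitions at a fixed lag and then computes its stationary distribution by power iteration. The function names, the two-state "unbound/bound" labeling, and the example trajectory are hypothetical and chosen for illustration only; this is not the estimator or the coupling scheme used in the paper.

```python
from collections import Counter

def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (no reversibility constraint)."""
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # Unvisited state: treat it as absorbing so that every row sums to one.
            matrix.append([1.0 if i == j else 0.0 for j in range(n_states)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix

def stationary_distribution(matrix, n_iter=1000):
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical discrete trajectory: 0 = unbound, 1 = bound (assumed labels).
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
T = estimate_transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)
```

In practice one would use much longer trajectories, validate the choice of lag time, and typically enforce detailed balance in the estimator.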
Temperature steerable flows and Boltzmann generators
Boltzmann generators approach the sampling problem in many-body physics by combining a normalizing flow and a statistical reweighting method to generate samples in thermodynamic equilibrium. The equilibrium distribution is usually defined by an energy function and a thermodynamic state. Here, we propose temperature steerable flows (TSFs) which are able to generate a family of probability densities parametrized by a choosable temperature parameter. TSFs can be embedded in generalized ensemble sampling frameworks to sample a physical system across multiple thermodynamic states.
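The statistical reweighting step can be illustrated with a minimal sketch in which a fixed Gaussian stands in for the trained flow (an assumption made purely for illustration). Samples x drawn from the proposal q receive self-normalized importance weights proportional to exp(-u(x)/kT) / q(x), so the same proposal can be reweighted to different temperatures. The harmonic energy and all names below are hypothetical.

```python
import math
import random

def reweighted_mean(observable, energy, kT, sigma, n_samples=20000, seed=0):
    """Self-normalized importance sampling estimate of <observable> at temperature kT.

    The proposal is a zero-mean Gaussian with standard deviation sigma,
    standing in for a generative model (assumption for illustration).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    # log w = -u(x)/kT - log q(x); additive constants cancel after normalization.
    log_w = [-energy(x) / kT + 0.5 * (x / sigma) ** 2 for x in xs]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    norm = sum(w)
    return sum(wi * observable(x) for wi, x in zip(w, xs)) / norm

def harmonic(x):
    """Hypothetical test energy u(x) = x^2 / 2, for which <x^2> = kT."""
    return 0.5 * x * x

for kT in (0.5, 1.0):
    estimate = reweighted_mean(lambda x: x * x, harmonic, kT, sigma=1.5)
    print(kT, estimate)  # each estimate should be close to kT
```

The proposal is deliberately wider than the target at both temperatures, which keeps the importance weights bounded; reweighting degrades quickly when the proposal misses regions where the target has mass.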
Diffusion-influenced reaction rates in the presence of pair interactions
The kinetics of bimolecular reactions in solution depends, among other
factors, on intermolecular forces such as steric repulsion or electrostatic
interaction. Microscopically, a pair of molecules first has to meet by
diffusion before the reaction can take place. In this work, we establish an
extension of Doi's volume reaction model to molecules interacting via pair
potentials, which is a key ingredient for interacting-particle-based
reaction-diffusion (iPRD) simulations. As a central result, we relate model
parameters and macroscopic reaction rate constants in this situation. We solve
the corresponding reaction-diffusion equation in the steady state and derive
semi-analytical expressions for the reaction rate constant and the local
concentration profiles. Our results apply to the full spectrum from well-mixed
to diffusion-limited kinetics. For limiting cases, we give explicit formulas,
and we provide a computationally inexpensive numerical scheme for the general
case, including the intermediate, diffusion-influenced regime. The obtained
rate constants decompose uniquely into encounter and formation rates, and we
discuss the effect of the potential on both subprocesses, exemplified for a
soft harmonic repulsion and a Lennard-Jones potential. The analysis is
complemented by extensive stochastic iPRD simulations, and we find excellent
agreement with the theoretical predictions.
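For orientation, the classical potential-free case of Doi's volume reaction model has a closed-form steady-state rate constant, k = 4πDR (1 − tanh(κR)/(κR)) with κ = sqrt(λ/D). The sketch below evaluates this textbook expression and is not the paper's extension to interacting particles; the symbol names (D for the relative diffusion constant, R for the reaction radius, lam for the microscopic rate λ) are assumptions for this example.

```python
import math

def doi_rate_constant(D, R, lam):
    """Macroscopic rate constant of the Doi model without pair interactions.

    D   : relative diffusion constant of the two reactants
    R   : reaction radius
    lam : microscopic reaction rate inside the reaction volume (must be > 0)
    """
    kappa_R = R * math.sqrt(lam / D)
    return 4.0 * math.pi * D * R * (1.0 - math.tanh(kappa_R) / kappa_R)

# Limiting cases (D = R = 1 chosen arbitrarily):
k_fast = doi_rate_constant(1.0, 1.0, 1e8)   # approaches the Smoluchowski rate 4*pi*D*R
k_slow = doi_rate_constant(1.0, 1.0, 1e-6)  # approaches lam * (4/3)*pi*R**3
```

The two limits correspond to the diffusion-limited regime (fast microscopic rate) and the well-mixed regime (slow microscopic rate), with the diffusion-influenced regime in between.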
UNICON: A unified framework for behavior-based consumer segmentation in e-commerce
Data-driven personalization is a key practice in fashion e-commerce,
improving the way businesses serve their consumers' needs with more relevant
content. While hyper-personalization offers highly targeted experiences to each
consumer, it requires a significant amount of private data to create an
individualized journey. To alleviate this, group-based personalization provides
a moderate level of personalization built on broader common preferences of a
consumer segment, while still being able to personalize the results. We
introduce UNICON, a unified deep learning consumer segmentation framework that
leverages rich consumer behavior data to learn long-term latent representations
and utilizes them to extract two pivotal types of segmentation catering to various
personalization use-cases: lookalike, expanding a predefined target seed
segment with consumers of similar behavior, and data-driven, revealing
non-obvious consumer segments with similar affinities. We demonstrate through
extensive experimentation our framework's effectiveness in fashion to identify
a lookalike Designer audience and data-driven style segments. Furthermore, we
present experiments that showcase how segment information can be incorporated
in a hybrid recommender system combining hyper and group-based personalization
to exploit the advantages of both alternatives and provide improvements in
consumer experience.
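One simple way to picture lookalike expansion is to rank consumers outside the seed segment by the cosine similarity between their embedding and the seed centroid. The sketch below assumes that embeddings are already available as plain vectors; the function names, the toy data, and the centroid-plus-cosine rule are illustrative assumptions and not necessarily how UNICON performs this step.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def lookalike(embeddings, seed_ids, top_k):
    """Return the top_k non-seed consumers most similar to the seed centroid."""
    dim = len(next(iter(embeddings.values())))
    centroid = [
        sum(embeddings[c][d] for c in seed_ids) / len(seed_ids) for d in range(dim)
    ]
    candidates = [c for c in embeddings if c not in seed_ids]
    candidates.sort(key=lambda c: cosine(embeddings[c], centroid), reverse=True)
    return candidates[:top_k]

# Hypothetical two-dimensional consumer embeddings.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.8, 0.3],
}
expanded = lookalike(embeddings, seed_ids={"a"}, top_k=2)
print(expanded)  # ['b', 'd']
```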
Enhanced sampling methods for molecular systems: multiscale and data-driven techniques
Simulations of molecular systems have led to significant discoveries in molecular biology.
The high accuracy of these simulations enables us to understand biological functions on a
molecular scale.
In connection with experimental results, they have proved to be a powerful tool to investigate
biological functions.
While the applications for such simulations are countless, in practice it is only possible to
simulate small systems due to computational limitations; reaching biologically relevant time- and
length-scales is still beyond feasibility, even for the most powerful computers.
This constraint is commonly known as the sampling problem.
With the progress in hardware development slowing down, demand for new methods that enable
reaching relevant scales is high.
This thesis aims to provide new tools that help molecular simulations reach biologically relevant scales. It
is split into two parts:
The first part provides new methods for rate computations in reactive systems, which can consist,
e.g., of protein-ligand binding, oligomerization, or protein-protein
association.
The first method combines Markov state models of
molecular kinetics with particle-based reaction-diffusion (PBRD) to generate a coarse-grained
simulation of interacting molecules.
This method conserves, at atomistic detail, the characteristic kinetics of the interactions
observed in molecular dynamics simulations of the interacting molecules in close proximity.
Furthermore, a method is introduced to provide realistic parameters for PBRD simulations.
In particular, it enables tuning the microscopic parameters of PBRD simulations such that
experimentally obtained rates are reproduced in the dilute limit.
This provides a well-defined starting point to study effects such as crowding, which are common at
the cellular scale.
The second part provides new methods based on Markov chain Monte Carlo. These can be utilized to
speed up the generation of equilibrium samples from the Boltzmann distribution and thus enable
faster computation of stationary observables.
In biological systems, it is often observed that high barriers in the free energy landscape
dramatically slow down the sampling process.
To speed up computations, a whole range of methods has been developed.
The latest advancements are facilitated by the recent rise of machine
learning research, which provides new promising tools to approach the sampling
problem from completely different angles.
In this spirit, a new method is introduced that aims to directly propose transitions between
regions of high population in phase space, thus jumping directly over energetic barriers.
These long-range moves are proposed by a neural network trained to generate high-efficiency moves,
which circumvents the slow transitions across energy barriers altogether.
A second proposed method is based on the recently developed Boltzmann Generators and aims to
combine these with parallel tempering in order to speed up sampling significantly. To this end, a
machine learning technique is employed which generates samples close to the Boltzmann distribution
at different temperatures. In both of these methods, the convergence to the correct distribution
is ensured by enforcing detailed balance.
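The role of detailed balance in parallel tempering can be made concrete with the standard Metropolis criterion for exchanging configurations between two replicas at inverse temperatures β_i and β_j with energies U_i and U_j: the swap is accepted with probability min(1, exp((β_i − β_j)(U_i − U_j))). The sketch below implements this textbook rule and checks numerically that forward and backward probability fluxes match; it does not reproduce the thesis's combination with Boltzmann Generators, and the helper names are hypothetical.

```python
import math

def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging configurations between two replicas."""
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    if exponent >= 0.0:
        return 1.0  # also avoids overflow in exp for large positive exponents
    return math.exp(exponent)

def joint_weight(beta_i, beta_j, energy_i, energy_j):
    """Unnormalized joint Boltzmann weight of the two-replica system."""
    return math.exp(-beta_i * energy_i - beta_j * energy_j)

# Detailed balance check with arbitrary example values.
beta_cold, beta_hot = 1.0, 0.5
u_cold, u_hot = 2.0, 0.7
forward = joint_weight(beta_cold, beta_hot, u_cold, u_hot) * swap_acceptance(
    beta_cold, beta_hot, u_cold, u_hot
)
backward = joint_weight(beta_cold, beta_hot, u_hot, u_cold) * swap_acceptance(
    beta_cold, beta_hot, u_hot, u_cold
)
print(math.isclose(forward, backward))  # True
```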
Multiscale molecular kinetics by coupling Markov state models and reaction-diffusion dynamics
A novel approach to simulate simple protein-ligand systems at large time- and
length-scales is to couple Markov state models (MSMs) of molecular kinetics
with particle-based reaction-diffusion (RD) simulations, MSM/RD. Currently,
MSM/RD lacks a mathematical framework to derive coupling schemes, is limited to
isotropic ligands in a single conformational state, and lacks a
multi-particle extension. In this work, we address these needs by developing a
general MSM/RD framework by coarse-graining molecular dynamics into hybrid
switching diffusion processes. Given enough data to parametrize the model, it
is capable of modeling protein-protein interactions over large time- and
length-scales, and it can be extended to handle multiple molecules. We derive
the MSM/RD framework, and we implement and verify it for two protein-protein
benchmark systems and one multiparticle implementation to model the formation
of pentameric ring molecules. To enable reproducibility, we have published our
code in the MSM/RD software package.
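To give a feel for what a switching diffusion process is, the following sketch simulates a one-dimensional Brownian particle whose diffusion coefficient depends on a discrete conformational state, with the state itself evolving as a discrete-time Markov chain. This is a toy illustration under assumed parameters (the function name, the per-step switching matrix, and the one-dimensional setting are all hypothetical); it is not the MSM/RD scheme or the API of the MSM/RD software package.

```python
import math
import random

def simulate_switching_diffusion(n_steps, dt, diff_coeffs, switch_probs, seed=0):
    """Simulate 1D free diffusion with Markov switching between conformations.

    diff_coeffs  : diffusion coefficient of each conformational state
    switch_probs : per-time-step transition matrix between states (rows sum to 1)
    """
    rng = random.Random(seed)
    x, state = 0.0, 0
    positions, states = [x], [state]
    n_states = len(diff_coeffs)
    for _ in range(n_steps):
        # Conformational switch: draw the next state from the current state's row.
        state = rng.choices(range(n_states), weights=switch_probs[state])[0]
        # Euler-Maruyama step of Brownian motion with the state's diffusion coefficient.
        x += math.sqrt(2.0 * diff_coeffs[state] * dt) * rng.gauss(0.0, 1.0)
        positions.append(x)
        states.append(state)
    return positions, states

# Hypothetical two-state example: a slow and a fast conformation.
positions, states = simulate_switching_diffusion(
    n_steps=1000,
    dt=1e-3,
    diff_coeffs=[0.1, 1.0],
    switch_probs=[[0.99, 0.01], [0.05, 0.95]],
)
```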